Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

RNA-Seq Data Analysis ◾ 175

denominator, we will obtain the reads per kilobase per million or RPKM. The RPKM is

for single-end reads. However, for paired-end reads, both forward and reverse reads are

aligned, and thus, “fragment” is used instead of “read” and the normalized unit of gene

expression in this case is FPKM (fragments per kilobase per million).
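As a rough illustration of the RPKM arithmetic described above, the Python sketch below divides each gene's read count by its length in kilobases and by the library size in millions of reads. The function name `rpkm`, the toy counts, and the use of the summed counts as the library size are assumptions made for this example. Applying the same arithmetic to fragment counts from paired-end data would give FPKM.

```python
def rpkm(counts, lengths_bp, library_size=None):
    """Reads per kilobase per million mapped reads, one value per gene.

    counts       -- reads aligned to each gene
    lengths_bp   -- gene lengths in bases
    library_size -- total mapped reads; assumed here to be sum(counts)
                    when not supplied (an assumption of this sketch)
    """
    if library_size is None:
        library_size = sum(counts)
    millions = library_size / 1e6
    return [c / (l / 1e3) / millions for c, l in zip(counts, lengths_bp)]


# Hypothetical toy data: three genes in one sample.
toy_counts = [100, 200, 700]
toy_lengths_bp = [1000, 2000, 1000]
rpkm_values = rpkm(toy_counts, toy_lengths_bp)
```

In this toy sample the first two genes receive the same RPKM even though the second has twice as many reads, because it is also twice as long.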

5.3.5.2 Transcripts per Million

Transcripts per million (TPM) [25] was proposed as an alternative to RPKM and FPKM
to adjust for gene-length bias and to be used for within-sample comparison of gene
expression. The TPM represents the abundance of reads aligned to gene i in relation to
the abundance of the reads aligned to the other genes in the same sample. To normalize the
RNA read counts, first divide the number of reads aligned to each gene by the gene's length
in bases, giving the count per base. Then divide each gene's count per base by the sum of
the counts per base over all genes and multiply by 1,000,000 to obtain the transcripts per
million (TPM).

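$$\mathrm{TPM}_i = \frac{Y_i / L_i}{\sum_{j} Y_j / L_j} \times 10^{6} \tag{5.2}$$

where $Y_i$ is the number of reads aligned to gene $i$, $L_i$ is the length of gene $i$ in bases, and the sum runs over all genes $j$ in the sample.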
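The two steps of the TPM calculation can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the function name `tpm` and the toy counts and lengths are hypothetical and do not come from any particular package.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million, one value per gene.

    Step 1: divide each gene's read count by its length (count per base).
    Step 2: divide by the sum of all counts per base and scale by 1e6.
    """
    per_base = [c / l for c, l in zip(counts, lengths_bp)]
    total_per_base = sum(per_base)
    return [r / total_per_base * 1e6 for r in per_base]


# Hypothetical toy data: three genes in one sample.
gene_counts = [100, 200, 700]
gene_lengths_bp = [1000, 2000, 1000]
tpm_values = tpm(gene_counts, gene_lengths_bp)
tpm_total = sum(tpm_values)  # sums to one million up to rounding error
```

Because every sample's TPM values sum to one million, a gene's TPM can be read as its share of the length-normalized reads in that sample.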

5.3.5.3 Counts per Million Mapped Reads

The counts per million (CPM) mapped reads normalizes the number of reads that map to a
particular gene after correcting for sequencing depth and transcriptome composition bias
[26]. CPM is used in between-sample differential analysis to compare the expression of
the same gene in different samples. It is not suitable for comparing the expression of
different genes within a sample because it does not adjust for gene length. The CPM of a
gene is defined as the number of reads mapped to the gene divided by the total number of
mapped reads (the library size), multiplied by 1,000,000.

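$$\mathrm{CPM}_i = \frac{Y_i}{N} \times 10^{6} \tag{5.3}$$

where $Y_i$ is the number of reads mapped to gene $i$ and $N$ is the total number of mapped reads (the library size). The same quantity is also called reads per million (RPM).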
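The CPM definition translates directly into code. The sketch below is illustrative only: the function name `cpm`, the two toy samples, and the use of the summed counts as the library size are assumptions made for this example. It applies only the depth scaling from the definition above and no further adjustment.

```python
def cpm(counts, library_size=None):
    """Counts per million mapped reads, one value per gene.

    library_size -- total mapped reads in the sample; assumed to be
                    sum(counts) when not supplied (sketch assumption)
    """
    if library_size is None:
        library_size = sum(counts)
    return [c / library_size * 1e6 for c in counts]


# Hypothetical toy data: the same three genes in two samples, where
# sample B was sequenced exactly twice as deeply as sample A.
sample_a = [100, 200, 700]
sample_b = [200, 400, 1400]
cpm_a = cpm(sample_a)
cpm_b = cpm(sample_b)
```

After scaling, each gene has the same CPM in both toy samples, so the twofold difference in raw counts is attributed to sequencing depth and not to expression.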

5.3.5.4 Trimmed Mean of M-values

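The trimmed mean of M-values (TMM) [27] is used by edgeR for between-sample differential
gene expression analysis. It uses the relative gene expression of two samples: one is the
sample of interest (treated), and the other is the reference that serves as the baseline
for comparison. The gene-wise log-fold change of gene g, M_g, is given as:

$$M_g = \log_2\!\left(\frac{Y_{gk} / N_k}{Y_{gr} / N_r}\right) \tag{5.4}$$

where $Y_{gk}$ and $Y_{gr}$ are the read counts of gene $g$ in the sample of interest $k$ and in the reference sample $r$, and $N_k$ and $N_r$ are the total numbers of reads in those two samples.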
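To make the gene-wise log-fold change concrete, the Python sketch below computes one M-value per gene for a treated sample against a reference. It covers only this first step and is not edgeR's implementation. The function name `m_values`, the toy counts, the use of summed counts as library sizes, and the choice to return `None` for genes with a zero count in either sample are all assumptions of this example.

```python
import math


def m_values(sample_counts, reference_counts):
    """Gene-wise log2 fold changes of library-size-scaled counts.

    Returns None for a gene with a zero count in either sample, since
    the log ratio is undefined there (a choice made for this sketch).
    """
    n_sample = sum(sample_counts)        # library size, sample of interest
    n_reference = sum(reference_counts)  # library size, reference sample
    result = []
    for y_s, y_r in zip(sample_counts, reference_counts):
        if y_s == 0 or y_r == 0:
            result.append(None)
        else:
            result.append(math.log2((y_s / n_sample) / (y_r / n_reference)))
    return result


# Hypothetical toy data: four genes, equal library sizes of 100 reads.
treated = [10, 20, 70, 0]
reference = [20, 20, 55, 5]
m = m_values(treated, reference)
```

A negative M-value means the gene makes up a smaller proportion of the treated library than of the reference library, a value of zero means equal proportions, and a positive value means a larger proportion.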